Group 19: Phase 2 - Cats vs Dogs Detector (CaDoD)

Team Members

We are a group of 4 members:

Aishwarya Sinhasane - avsinhas@iu.edu (In picture, Left top)

Himanshu Joshi - hsjoshi@iu.edu (In picture, Right bottom)

Sreelaxmi Chakkadath - schakkad@iu.edu (In picture, Left bottom)

Sumitha Vellinalur Thattai - svtranga@iu.edu (In picture, Right top)

[Image: team photo]

Project Abstract

The objective of our project is to classify images as containing either a dog or a cat. Additionally, we plan to locate the animal within the image. Although the task is simple for the human eye, computers find it hard to distinguish between images because of a plethora of factors, including cluttered backgrounds, illumination conditions, deformations, and occlusions, among others. We plan to build an end-to-end machine learning model that will help computers differentiate between cat and dog images with better accuracy.

In our previous phase, we finalized gradient boosting and linear regression as our baselines for image classification and bounding-box detection, respectively. In this phase, we extended the baseline and implemented a complex loss function (CXE + MSE) using homegrown linear and logistic regression.

Furthermore, we built a multi-layer perceptron and tracked the accuracy and loss per epoch for the classification and regression tasks. We used data augmentation, dropout layers, and regularization to overcome overfitting. In addition, we built a multi-headed predictor that computes the combined loss from the classification and regression tasks and uses it to optimize the weights and biases of the network.

Accuracy and loss are our primary evaluation metrics. In addition, we used a classification report (F1 score) and a confusion matrix to evaluate the classification model. Using the above models, we obtained an accuracy of ~60% for classification and an MSE close to zero for regression.
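The evaluation metrics named above (accuracy, F1 via a classification report, and a confusion matrix) can all be computed with scikit-learn. The sketch below uses toy labels rather than our actual validation split; the 0 = cat, 1 = dog encoding is an assumption for illustration.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Toy labels standing in for a validation split (0 = cat, 1 = dog)
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

acc = accuracy_score(y_true, y_pred)          # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred)         # rows = true class, cols = predicted
print(f"Accuracy: {acc:.2f}")
print(cm)
print(classification_report(y_true, y_pred, target_names=["cat", "dog"]))
```

The classification report breaks precision, recall, and F1 down per class, which is more informative than accuracy alone when the classes are imbalanced.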

Project Meta Data

We have completed the following tasks this week:

Data Description

The data we plan to use is a Kaggle data set. We will be using two files – one for the images and the other for the bounding boxes:

The images are taken from cadod.tar.gz

The bounding-box information for the images comes from the cadod.csv file

Image information (cadod.tar.gz):

Attributes of the Boundary File (cadod.csv):

- 15 numerical features: the image ID, the coordinates of the bounding boxes, and the normalized coordinates of the bounding boxes

- 5 categorical features: information about occlusion, depiction, truncation, etc.
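Loading and inspecting the boundary file is a small pandas exercise. The column names below are hypothetical placeholders (the actual headers in cadod.csv may differ); the sketch also shows how normalized coordinates can be converted back to pixel coordinates given an image size.

```python
import io
import pandas as pd

# Hypothetical excerpt of cadod.csv; real column names/values may differ.
csv_text = """ImageID,XMin,XMax,YMin,YMax,IsOccluded,IsTruncated,IsDepiction
img_001,0.10,0.80,0.20,0.90,0,0,0
img_002,0.05,0.55,0.15,0.70,1,0,0
"""
boxes = pd.read_csv(io.StringIO(csv_text))

# Normalized coordinates scale back to pixels given the image dimensions
width, height = 224, 224
boxes["x_min_px"] = (boxes["XMin"] * width).round().astype(int)
boxes["x_max_px"] = (boxes["XMax"] * width).round().astype(int)
print(boxes[["ImageID", "x_min_px", "x_max_px"]])
```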

[Image: boundary file attributes]

Task

The following are the end-to-end tasks to achieve the results:

Since the data set is very large, these tasks were carried out on only a subset of the data. Hence, the results are directional rather than definitive.

Import Data

Unarchive data

Load bounding box meta data

Preprocess

Rescale the images

Plot the resized and filtered images

Checkpoint and Save data
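The preprocessing steps above (rescale, then checkpoint) can be sketched as follows. The 128×128 target size is an assumption for illustration, not a value fixed in the report, and a synthetic PIL image stands in for one unarchived from cadod.tar.gz.

```python
import os
import tempfile

import numpy as np
from PIL import Image

TARGET_SIZE = (128, 128)  # assumed working resolution; originals vary in size

def preprocess(img: Image.Image, target=TARGET_SIZE) -> np.ndarray:
    """Rescale an image and normalize pixel values to [0, 1]."""
    resized = img.convert("RGB").resize(target)
    return np.asarray(resized, dtype=np.float32) / 255.0

# Synthetic stand-in for an image extracted from cadod.tar.gz
raw = Image.new("RGB", (500, 375), color=(120, 60, 30))
x = preprocess(raw)

# Checkpoint: save the preprocessed array so later steps can reload it
path = os.path.join(tempfile.gettempdir(), "sample_preprocessed.npy")
np.save(path, x)
```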

Baseline in SKLearn

[Image: baseline results]

Metrics for evaluation:

Loss function

Load data

Double check that it loaded correctly
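The scikit-learn baseline pairs a GradientBoostingClassifier for the class label with a LinearRegression for the box coordinates. The sketch below uses random features and toy targets in place of the preprocessed image vectors; hyperparameters such as `n_estimators=50` are assumptions, not the values used in our runs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in features: flattened/pooled image vectors (the real inputs come
# from the preprocessed cadod images).
X = rng.random((200, 20))
y_cls = (X[:, 0] > 0.5).astype(int)   # toy rule: 0 = cat, 1 = dog
y_box = X[:, :4] * 0.5 + 0.1          # toy normalized box targets

clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y_cls)
reg = LinearRegression().fit(X, y_box)

print("train accuracy:", clf.score(X, y_cls))
print("box R^2:", reg.score(X, y_box))
```

Because the toy box targets are an exact linear function of the features, the regression fits almost perfectly here; real bounding-box targets are far noisier.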

Homegrown CXE + MSE

[Image: homegrown CXE + MSE loss derivation]

Homegrown Linear Regression

Linear Regression Loss Function

The mean squared error formula is as follows: $ \text{MSE}({\mathbf{\theta}}; \mathbf{X}) = \dfrac{1}{m} \sum\limits_{i=1}^{m}{( \hat{y}_i - y_i)^2} $

where $m$ is the number of data points, $\hat{y}_i$ is the predicted value, and $y_i$ is the true value.
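The formula above translates directly into NumPy; the toy values below are for illustration only.

```python
import numpy as np

def mse(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error as defined above: (1/m) * sum((y_hat - y)^2)."""
    return float(np.mean((y_hat - y) ** 2))

y_true = np.array([0.2, 0.4, 0.6])
y_pred = np.array([0.1, 0.4, 0.9])
print(mse(y_pred, y_true))
```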

Split data

Train

Evaluation

Mean Square Error:

Homegrown implementation of Logistic Regression

Implement a homegrown logistic regression model and extend the loss function from CXE to CXE + MSE, i.e., make it a complex multitask loss function so that the resulting model predicts the class and the bounding-box coordinates at the same time.

Combined MSE and CXE loss.
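A minimal NumPy sketch of the combined loss: binary cross-entropy on the predicted class probability plus MSE on the box coordinates. The weighting term `alpha` is an assumed knob for balancing the two tasks, not a value fixed in the report.

```python
import numpy as np

def cxe(p: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Binary cross-entropy: -(1/m) * sum(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(p, eps, 1 - eps)  # guard against log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def combined_loss(p, y_cls, box_hat, box, alpha=1.0):
    """Multitask loss: CXE on the class head + alpha * MSE on the box head."""
    mse = float(np.mean((box_hat - box) ** 2))
    return cxe(p, y_cls) + alpha * mse

p = np.array([0.9, 0.2])   # predicted P(dog)
y = np.array([1.0, 0.0])   # true labels
box_hat = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6]])
box = np.array([[0.1, 0.1, 0.5, 0.5], [0.2, 0.2, 0.6, 0.6]])
loss = combined_loss(p, y, box_hat, box)
```

With perfect box predictions, as in this toy example, the combined loss reduces to the CXE term alone.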

Result:

Multi Layered Perceptron

[Image: multi-layer perceptron architecture]

We will be using both the sequential API and a class definition (the OOP API) to build neural networks.
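The two styles can be sketched side by side in PyTorch. The layer sizes (128×128×3 input, 64 hidden units) are assumptions for illustration, not the exact architecture from our runs; both definitions below describe the same network.

```python
import torch
import torch.nn as nn

IN_FEATURES = 128 * 128 * 3  # flattened image size; 128x128 is assumed

# Sequential API: layers listed in forward order
seq_clf = nn.Sequential(
    nn.Flatten(),
    nn.Linear(IN_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, 2),  # two classes: cat, dog
)

# Class-definition (OOP) API: the same network as a module
class MLPClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(IN_FEATURES, 64)
        self.fc2 = nn.Linear(64, 2)

    def forward(self, x):
        x = torch.relu(self.fc1(self.flatten(x)))
        return self.fc2(x)

x = torch.randn(4, 3, 128, 128)  # batch of 4 toy images
out_seq = seq_clf(x)
out_oop = MLPClassifier()(x)
```

The OOP form is more verbose but makes it easy to add custom forward logic later, which is why it is the natural fit for the multi-headed model.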

Image classification

Sequential neural network classifier

A similar process was followed for OOP as well

```python
from google.colab import drive
drive.mount('/content/drive')
```

```python
!ls "/content/drive/My Drive/Colab Notebooks"
```

Training MultiLayer Perceptron

Image classification

Sequential Neural Network Model

PyTorch OOP API Neural network

Experiments

Without Data Augmentation

With Regularization and Dropout

Data Augmentation

With Data Augmentation

Without regularization and dropout

With Regularization and Without Dropout

With Dropout

Dropout value of 0.1
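The regularization and dropout variants above differ only in the model and optimizer configuration. A minimal sketch of the "with regularization and dropout" setup: dropout with p = 0.1 as in our experiments, and L2 regularization via the optimizer's `weight_decay` term (the 1e-4 value and layer sizes are assumptions; data augmentation would additionally be applied on the input pipeline, e.g. random flips and crops).

```python
import torch
import torch.nn as nn

# MLP variant with a dropout layer (p = 0.1)
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(128 * 128 * 3, 64),
    nn.ReLU(),
    nn.Dropout(p=0.1),
    nn.Linear(64, 2),
)

# L2 regularization enters through weight_decay (assumed value)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Dropout is active in train() mode and disabled in eval() mode
model.train()
x = torch.randn(2, 3, 128, 128)
train_out = model(x)
model.eval()
eval_out = model(x)
```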

Results and Discussion

[Images: classifier accuracy and loss per epoch]

MLP Regressor for Bounding Box prediction

Sequential neural network regressor

A similar process was followed for OOP as well

```python
from google.colab import drive
drive.mount('/content/drive')
```

```python
!ls "/content/drive/My Drive/Colab Notebooks"
```

OOP API for Regression
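The regressor mirrors the classifier but outputs 4 bounding-box coordinates and is trained with MSE loss. In this sketch (layer sizes assumed, as before), a final sigmoid keeps the predictions in [0, 1] to match the normalized coordinates; whether to constrain the outputs this way is a design choice, not something fixed by the report.

```python
import torch
import torch.nn as nn

class BoxRegressor(nn.Module):
    """MLP head predicting 4 normalized bounding-box coordinates."""
    def __init__(self, in_features=128 * 128 * 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 64),
            nn.ReLU(),
            nn.Linear(64, 4),   # x_min, y_min, x_max, y_max
            nn.Sigmoid(),       # keep outputs in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

model = BoxRegressor()
criterion = nn.MSELoss()
x = torch.randn(8, 3, 128, 128)
target = torch.rand(8, 4)       # toy normalized box targets
out = model(x)
loss = criterion(out, target)
```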

Results and Discussion

[Images: regressor loss per epoch]

Multiheaded CXE + MSE Network

[Image: multi-headed network architecture]

Training the Multi-Layer Perceptron

Image Classification

Classifier model

Regressor model

Multi headed model
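The multi-headed model can be sketched as a shared trunk feeding two heads, with the combined CXE + MSE loss backpropagated through both at once (layer sizes are assumed, and the unweighted sum of the two losses is one possible choice of combination).

```python
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    """Shared trunk with a classification head and a box-regression head."""
    def __init__(self, in_features=128 * 128 * 3):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_features, 64),
            nn.ReLU(),
        )
        self.cls_head = nn.Linear(64, 1)   # logit for P(dog)
        self.box_head = nn.Linear(64, 4)   # bounding-box coordinates

    def forward(self, x):
        h = self.trunk(x)
        return self.cls_head(h), self.box_head(h)

model = MultiHeadNet()
cxe, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

x = torch.randn(4, 3, 128, 128)
y_cls = torch.randint(0, 2, (4, 1)).float()
y_box = torch.rand(4, 4)

logits, boxes = model(x)
loss = cxe(logits, y_cls) + mse(boxes, y_box)  # combined CXE + MSE
loss.backward()  # one backward pass updates both heads and the trunk
```

A single optimizer step on this combined loss is what lets the two tasks share and shape the trunk's weights together.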

Results and Discussion

[Image: multi-headed model results]

Conclusion:

Our main objective was to classify images of cats and dogs and to identify the location of the animal. This fundamental problem in computer vision is the basis of many other computer vision tasks. In this phase, we used neural networks with data augmentation, dropout layers, and regularization to make the predictions, which reduced overfitting and yielded an accuracy of ~60%. In addition, we combined the CXE and MSE losses and backpropagated the combined loss to optimize the neural network, which helped reduce the loss steadily after each epoch.

Our next steps will be as follows: